1 SUMMARY


The main achievement of the project is a set of software tools that demonstrate the
successful detection of colluding team cyber insider attacks along several dimensions
(time, space, value). Detection relies on fine-grained monitoring of document access
sessions and statistical analysis of combinations of apparently innocuous access
sessions. The demonstration used the concrete observables and evaluation metrics
detailed in Sections 4.5 and 4.6.

The  software  tools  include:  a  document  storage  and  access  metadata  storage  system 
(the  PPS),  a  document  access  monitoring  tool  (a  Firefox  plug-in),  document  translation 
tools  (e.g.,  PDF  to  HTML),  and  an  attack  generator  (Malware  Test  Platform)  that  can 
generate  various  forms  of  cyber  insider  attack,  including  the  various  forms  of  collusion. 
Using these software tools, we demonstrated the effective detection of stealthy cyber
insider attacks through statistical analysis, in particular several correlation analysis
methods.

2  Introduction 

2.1  Purpose  of  Report  and  Project 

This final report is a deliverable summarizing the effort expended by the Georgia
Institute of Technology (GT) team in support of AFRL and DARPA on Contract
FA8750-11-C-0136, part of the Cyber Insider Threat (CINDER) program. The project
Principal Investigator is Prof. Calton Pu.

The objective of this project is to develop a prototype system for detecting document
access activities that indicate cyber insider attacks, including collusion among multiple
cyber attackers. The attacks covered range from indiscriminate bulk read/bulk copy to
evasive techniques such as low intensity, fine granularity, and high value/low volume
access of sensitive document content by cyber attacker software. At the basic level
(called Generation 1 in this project), indiscriminate accesses by cyber attackers are
detected through the controlled access APIs of the customized PPS document storage
system. This is analogous to some existing tools (e.g., DLP software) that work on
network traffic and standard file systems. More sophisticated cyber insider attacks
(called Generation 2 in this project) that combine apparently normal access sessions
are detected through statistical analyses of correlation to valuable document content.
This is achieved by our software tools, which monitor all access sessions (from PPS) and
perform correlation analysis of document access sessions by potentially colluding cyber
insider attackers.

2.2  Mission  Description 

Evasive Cyber Insider Mission - Our mission of interest lies beyond the Generation 1
cyber insider malware that employs bulk access tactics. Such malware can already be
detected through existing intrusion detection techniques that rely on threshold-based
filtering and analysis of outbound information flows, e.g., Data Loss Prevention (DLP)
tools such as OpenDLP.

Beyond  bulk  access,  we  anticipate  evasive  cyber  insider  attacks  (Generation  2)  that  will 
use  several  strategies  to  escape  generic  thresholds  on  coarse-grain  metrics.  We  will 
investigate  primarily  three  evasion  strategies  (and  their  combinations)  that  can  be  used 
by  individual  cyber  insiders  or  cyber  insider  teams  working  in  collusion:  time  division, 
space  division,  and  value-driven.  First,  time  division  reduces  the  intensity  of  detectable 
activity  by  separating  periods  of  activity  in  time.  Second,  space  division  reduces  the 
granularity  of  detectable  activity  by  dividing  the  work  among  various  members  of  a 
colluding  team.  Third,  the  value-driven  approach  takes  advantage  of  insider  attack 
semantics  (stealing  of  information)  by  reducing  the  total  amount  of  detectable  activity 
and  accessing  only  high  value  parts  of  a  document. 

2.3  Project  Members 


Table 1 List of Project Members

| Member Name | Position | Duration of Work |
|---|---|---|
| Pu, Calton | Principal Investigator | 03/11 - 06/12 |
| Liu, Ling | Technical Lead | 04/11 - 09/11 |
| Grayson, Christopher | Research Scientist | 04/11 - 06/12 |
| Suh, James | Application Developer | 08/11 - 04/12 |
| Malkowski, Simon | Graduate Research Assistant and Project Manager | 04/11 - 04/12 |
| Doo, Myungcheol | Graduate Research Assistant | 08/11 - 04/12 |
| Kisung, Lee | Graduate Research Assistant | 08/11 - 04/12 |
| Wang, De | Graduate Research Assistant | 08/11 - 04/12 |
| Yang, Yi | Graduate Research Assistant | 08/11 - 04/12 |
| Park, Junhee | Graduate Research Assistant | 05/11 - 04/12 |
| Wu, Qinyi | Graduate Research Assistant | 05/11 - 07/11 |
| Mohan, Paritosh | Graduate Research Assistant | 08/11 - 04/12 |

3  Methods,  Assumptions,  and  Procedures 

3.1  Technical  Objectives  and  Methods 

The  technical  deliverables  of  the  project  are  divided  into  three  main  components: 

1. The document access infrastructure, including the PPS (Partial Persistent
Sequences) API and a customized relational database implementation of PPS for
this project. This is discussed in Section 4.1.

2. The insider access monitoring and detection mechanisms for HTML and TXT
documents stored in PPS and accessed through a web browser (specifically,
Firefox), and translators for documents stored in other formats such as PDF. This
is discussed in Section 4.2.

3.  The  MTP  (Malware  Test  Platform),  a  powerful  document  access  generator  that 
simulates  a  variety  of  insider  accesses  to  documents.  This  is  discussed  in 
Section  4.3. 

The  progress  made  on  these  deliverables  is  measured  by  the  concrete  observables 
listed  in  Section  4.5  and  evaluated  according  to  the  concrete  metrics  listed  in  Section 
4.6.  These  concrete  observables  and  metrics  are  a  subset  of  and  derived  from  the 
original  observables  and  evaluation  metrics  developed  during  the  initial  period  of 
performance,  in  collaboration  with  the  CINDER  program  independent  evaluators  at  MIT 
Lincoln  Labs.  As  the  program  progressed  and  focused  specifically  on  cyber  insider 
activities,  the  project  focused  on  the  concrete  observables  and  metrics  listed  in  Sections 
4.5  and  4.6. 


3.2 Classification of Cyber Insider Threats (Assumptions)

We use an adversarial learning-based classification of cyber insider threats to put our
project in perspective. The simplest category of cyber insider attacks, called Generation
1, consists of bulk transfer of data that can be detected using current DLP tools.
Generation 1 cyber insider attacks are oblivious to defense tools.

The next category, called Generation 2, consists of cyber insider attacks that are aware
of threshold-based defense tools such as DLP and intrusion detection software. They
maintain a low profile by staying below the threshold set by DLP and intrusion detection
tools. There are various techniques for staying below the threshold, which are modeled
as collusion in this project. A Generation 2 cyber insider attack may use time division
(accessing a document over a period of time through several short sub-sessions by the
same insider), space division (accessing a document through several sub-sessions by
different insiders), and a value-based approach (reading the document in a set of
sub-sessions by one insider, and copying only the valuable parts of the document in a
short sub-session by another insider). This project is focused on Generation 2 cyber
insider attacks.

Further evolution of cyber insider attacks includes the emulation of human reading
patterns to escape detection mechanisms that attempt to distinguish human access
from cyber tool access. Due to time constraints, this project did not explore the cyber
insider attacks that emulate human access patterns (Generation 3).

3.3  Summary  of  Project  Achievements 

The main achievement of the project is a set of software tools that demonstrate the
successful detection of colluding team cyber insider attacks along several dimensions
(time, space, value). Detection relies on fine-grained monitoring of document access
sessions and statistical analysis of combinations of apparently innocuous access
sessions. The demonstration used the concrete observables and evaluation metrics
detailed in Sections 4.5 and 4.6.

The  software  tools  include:  a  document  storage  and  access  metadata  storage  system 
(the  PPS),  a  document  access  monitoring  tool  (a  Firefox  plug-in),  document  translation 
tools  (e.g.,  PDF  to  HTML),  and  an  attack  generator  (Malware  Test  Platform)  that  can 
generate  various  forms  of  cyber  insider  attack,  including  the  various  forms  of  collusion. 
Using these software tools, we demonstrated the effective detection of stealthy cyber
insider attacks through statistical analysis, in particular several correlation analysis
methods.


4  Results  and  Discussion 

The  project’s  main  results  are  software  tools  described  in  Sections  4.1, 4.2,  and  4.3. 
The results are evaluated and discussed in Sections 4.4, 4.5, and 4.6.

4.1 PPS Infrastructure

We have designed and implemented a customized version of the PPS (Partial
Persistent Sequences) representation of documents. This version of PPS can be
described as a customized file system with significant metadata collection and querying
capabilities that are absent in typical file systems. At an abstract level, PPS collects all
data access information (for both reads and writes) at fine granularity (currently at
cursor and mouse pointer resolution) and stores it with the relevant document
component.

From  a  data  structure  point  of  view,  PPS  is  implemented  as  a  tree,  where  top  level 
nodes  contain  the  actual  words  of  a  document,  and  the  children  nodes  of  each  word 
store  its  access  information.  Concretely,  we  store  both  the  data  and  metadata  in  a 
relational  DBMS  (currently  MySQL). 
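
The tree layout described above can be sketched as two relational tables: one row per word (the top-level nodes) and one child row per access event. This is only a minimal illustration under stated assumptions. The table names, column names, and helper functions are hypothetical and not taken from the project code, and Python's built-in sqlite3 module stands in for the MySQL backend purely to keep the example self-contained.

```python
import sqlite3

def create_pps(conn):
    """Create hypothetical PPS tables: word (top-level nodes) and access (children)."""
    conn.executescript("""
        CREATE TABLE word (
            word_id  INTEGER PRIMARY KEY,
            doc_id   INTEGER NOT NULL,
            position INTEGER NOT NULL,   -- index of the word within the document
            text     TEXT    NOT NULL
        );
        CREATE TABLE access (
            access_id INTEGER PRIMARY KEY,
            word_id   INTEGER NOT NULL REFERENCES word(word_id),
            user_name TEXT    NOT NULL,
            kind      TEXT    NOT NULL CHECK (kind IN ('read', 'write')),
            seconds   REAL    NOT NULL   -- time spent on the word in this access
        );
    """)

def store_document(conn, doc_id, text):
    """Store each word of the document as a top-level node."""
    rows = [(doc_id, i, w) for i, w in enumerate(text.split())]
    conn.executemany(
        "INSERT INTO word (doc_id, position, text) VALUES (?, ?, ?)", rows)

def record_access(conn, doc_id, position, user_name, kind, seconds):
    """Attach one access record as a child of the word at the given position."""
    (word_id,) = conn.execute(
        "SELECT word_id FROM word WHERE doc_id = ? AND position = ?",
        (doc_id, position)).fetchone()
    conn.execute(
        "INSERT INTO access (word_id, user_name, kind, seconds) VALUES (?, ?, ?, ?)",
        (word_id, user_name, kind, seconds))

def seconds_per_word(conn, doc_id):
    """Query the metadata: total read time per word, in document order."""
    return conn.execute("""
        SELECT w.position, w.text, COALESCE(SUM(a.seconds), 0.0)
        FROM word w LEFT JOIN access a
          ON a.word_id = w.word_id AND a.kind = 'read'
        WHERE w.doc_id = ?
        GROUP BY w.word_id ORDER BY w.position
    """, (doc_id,)).fetchall()
```

Keeping the access records in a separate child table means the document text is stored once, while any number of read or write events can accumulate under each word and be aggregated by a query.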

The PPS provides four major functions. First, it supports read/write of document data.
Second, it monitors and records all accesses to document data. Third, the access
monitor can detect anomalous (bot program) accesses and block them immediately.
Fourth, PPS provides a simple query facility for the metadata, so we can extract
appropriate information for further analysis.

4.2  Monitoring  and  Detection  Tools 

While the PPS infrastructure benefited significantly from previous work (it builds on
Qinyi Wu’s dissertation, which was completed in summer 2011), the access monitoring
tools were built from the ground up specifically for this project. These tools consist of a
GUI, a logging facility to capture document access metadata, and various statistical
analysis tools.

For  the  GUI,  we  have  chosen  the  web  browser  interface,  since  it  is  the  most  popular 
GUI  for  many  applications,  including  document  access.  Our  implementation  is  a  Firefox 
plug-in  (Javascript  implementation)  for  HTML  documents,  plus  translators  for  TXT 
(easily  translated  into  HTML)  and  PDF  (somewhat  more  involved  translation).  A  parallel 
project  (funded  by  other  sources  at  George  Mason  University)  built  similar  monitoring 
tools  for  Microsoft  Word. 

The Javascript implementation of the Firefox plug-in has been used by more than 50
students in Prof. Pu’s classes in the Fall 2011 and Spring 2012 semesters for their paper
reading assignments. (This work was carried out after suitable IRB approval. We did not
collect any participant identification data, but assume that each student used his/her
own account for access.) Their reading access patterns have been recorded and are
being analyzed for a paper on the experimental analysis of real world document
access patterns.

The  initial  implementation  of  the  Firefox  plug-in  (Javascript  implementation)  was  able  to 
capture  the  time  spent  on  a  document  at  the  granularity  of  approximately  a  paragraph. 
This  metadata  is  collected  into  the  PPS  backend  database  for  subsequent  statistical 
analysis.  The  granularity  of  capture  was  refined  to  a  line  and  further  refinements  are 
feasible,  with  some  additional  overhead. 

The  Firefox  GUI  read  access  monitor  collects  access  statistics  and  sends  data  to  the 
PPS  database  at  a  tunable  time  interval  (usually  set  at  20  seconds).  At  this  interval,  our 
access  monitor  is  virtually  undetectable  by  human  users  and  has  very  low  overhead  (a 
few  percent  CPU  usage  on  a  PC).  If  additional  overhead  is  acceptable,  fine-grained 
real-time  data  collection  may  enable  the  real-time  detection  of  insider  activity. 
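
The batching behavior described above can be sketched as follows. This is a hedged illustration only: the actual plug-in is written in Javascript, and the class name, method names, and the `send` callback here are hypothetical stand-ins. The sketch accumulates reading time per line and forwards the batch once the tunable interval (20 seconds by default, as in the text) has elapsed.

```python
class AccessMonitor:
    """Accumulates reading time per line and flushes it in batches (hypothetical sketch)."""

    def __init__(self, send, flush_interval=20.0):
        self.send = send                      # callback standing in for the upload to PPS
        self.flush_interval = flush_interval  # tunable interval, in seconds
        self.pending = {}                     # line number -> seconds accumulated
        self.last_flush = None

    def observe(self, now, line, dwell_seconds):
        """Record time spent on a line; flush once the interval has elapsed."""
        if self.last_flush is None:
            self.last_flush = now
        self.pending[line] = self.pending.get(line, 0.0) + dwell_seconds
        if now - self.last_flush >= self.flush_interval:
            self.flush(now)

    def flush(self, now):
        """Send the accumulated batch (if any) and start a new one."""
        if self.pending:
            self.send(dict(self.pending))
        self.pending.clear()
        self.last_flush = now
```

A shorter `flush_interval` moves the monitor toward the real-time collection mentioned above, at the cost of more frequent uploads.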

The GUI displays the analysis results both as a graph and as a heatmap for the
visualization of document reading behavior. The display part of the statistical analysis
tool reproduces the graph and heatmap visualization of the plug-in. The GUI includes
the collusion analysis, which combines several sub-sessions for visualization and
subsequent statistical analysis to detect collusion among various sub-sessions.

The development of statistical analysis tools contributed to finding collusion in stealthy
cyber insider attacks. This is facilitated by displaying the data in visually meaningful
ways (e.g., graphs and heatmaps) and by correlating reading access data with functional
models of fine-grain read access for each document that may indicate insider activity.
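
As one possible illustration of correlating reading access data with a functional model, the sketch below computes the Pearson correlation between per-line reading time and a document value curve. The report does not state which correlation measures were used or how a coefficient maps to an alert, so the choice of Pearson correlation, the function names, and the sample numbers are all assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat curve carries no correlation signal
    return cov / math.sqrt(var_x * var_y)

def value_correlation(reading_seconds, value_curve):
    """Correlate per-line reading time with the document value curve."""
    return pearson(reading_seconds, value_curve)

# Hypothetical five-line document whose third and fourth lines are valuable.
value_curve = [0.1, 0.1, 0.9, 0.8, 0.1]
targeted = value_correlation([0, 0, 30, 25, 0], value_curve)     # close to 1
uniform = value_correlation([10, 10, 10, 10, 10], value_curve)   # exactly 0.0
```

In this toy example, a reader who dwells only on the valuable lines tracks the value curve closely, while a reader who spends the same time on every line shows no correlation with it.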

Specific  objectives  achieved: 

1. Sensors: the Firefox plug-in (and the Javascript implementation) as the first
reader/finder recorders, to record the time a reader (including cyber insider
attackers) spends at each point of the document.

2.  Test  data  set:  a  selection  of  open  documents  from  various  sources  (e.g., 
Wikipedia)  for  testing  of  the  finder  recorder,  annotated  with  document  value 
curve. 

3. Statistical analysis and visualization tools: the visualization component, functional
models of reading behavior, API for representing such functional models,
statistical analysis algorithms that correlate reading access data (the
experimentally recorded reading time) with functional models (document value
curve) to find the cyber insiders.

4.3 Malware Test Platform

One  of  the  main  challenges  in  the  evaluation  of  our  software  tools  is  the  generation  of 
relevant  stealthy  cyber  insider  attacks.  This  is  a  challenge  since  no  details  of  such 
attacks  have  been  published  in  the  open  literature.  Our  approach  to  address  the 
challenge  is  the  development  of  a  test  tool,  the  Malware  Test  Platform  (MTP),  which  is 
capable  of  generating  the  various  combinations  of  colluding  attacks,  including  the  time 
division,  space  division,  and  value  division,  as  well  as  their  combinations. 

In the time division collusion attack, portions of a document are read and copied by a
cyber insider attacker over several sub-sessions. The time division attack is designed to
be effective against typical DLP tools, since it takes only a small amount of data each
time and so stays below the threshold of DLP detection. The MTP generates read/copy
sessions (by the same insider) at increasingly finer granularities to simulate a time
division attack.

In the space division collusion attack, portions of a document are read and copied by
several cyber insider attackers in different sub-sessions. The space division attack is
also designed to be effective against threshold-based DLP tools, for the same reasons
as the time division collusion attack. The MTP generates read/copy sessions (by
different insiders) at increasingly finer granularities to simulate a space division attack.

In the value division collusion attack, only the valuable portions of a document are taken
by one or several cyber insider attackers. We assume the attacker does not have the
capability to guess the valuable parts of the document without reading it first.
Consequently, a value division collusion attack includes sub-sessions that read and copy
portions of the document. The MTP generates read/copy sessions of various portions
(by the same or different insiders) to simulate a value division attack and the
combination of various collusion attacks.
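
The three collusion modes above can be sketched as simple session generators. This is not the MTP implementation. The function names, the dictionary layout of a sub-session, and the insider labels are hypothetical, and a document is reduced to a list of line indices (plus a value curve for the value division case).

```python
def split_ranges(num_lines, parts):
    """Split line indices 0..num_lines-1 into contiguous, near-equal ranges."""
    base, extra = divmod(num_lines, parts)
    ranges, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        if size:
            ranges.append(range(start, start + size))
        start += size
    return ranges

def time_division(num_lines, parts, insider="insider-0"):
    """The same insider reads/copies the document over several sub-sessions."""
    return [{"insider": insider, "lines": list(r)}
            for r in split_ranges(num_lines, parts)]

def space_division(num_lines, insiders):
    """Different insiders each read/copy one portion of the document."""
    ranges = split_ranges(num_lines, len(insiders))
    return [{"insider": who, "lines": list(r)} for who, r in zip(insiders, ranges)]

def value_division(value_curve, threshold, reader="insider-0", copier="insider-1"):
    """One sub-session reads everything to locate value; another copies only valuable lines."""
    valuable = [i for i, v in enumerate(value_curve) if v >= threshold]
    return [
        {"insider": reader, "lines": list(range(len(value_curve))), "action": "read"},
        {"insider": copier, "lines": valuable, "action": "copy"},
    ]
```

Raising `parts` (or the number of insiders) yields the increasingly finer granularities mentioned above, and the generators can be chained to produce combined attacks.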

The  MTP  is  designed  modularly  to  support  other  Generation  2  cyber  insider  attack 
modes  if  conceived. 

4.4  Demonstration  and  Evaluation 

Using the MTP, we are able to demonstrate the effectiveness of the software tools in
monitoring document access and then combining the various sub-sessions for statistical
analyses that detect collusions of various forms. The initial statistical analysis tools
simply combine the sub-sessions for analysis, without concern about the particular
collusion approach. This exhaustive search worked well up to about 7 sub-sessions, at
which point the combinatorial explosion caused the analysis to take longer than an hour
on a PC.
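
The exhaustive approach can be sketched as an enumeration of every combination of two or more sub-sessions. In this simplified illustration the per-combination statistical analysis is replaced by a plain coverage check, which is an assumption made for brevity, and the function name and data layout are hypothetical. For n sub-sessions the loop examines 2^n - n - 1 combinations, which is the source of the combinatorial growth.

```python
from itertools import combinations

def exhaustive_collusion_search(sessions, num_lines, min_coverage=1.0):
    """Try every combination of two or more sub-sessions.

    `sessions` maps a sub-session id to the set of line numbers it accessed.
    Returns the combinations whose union covers at least `min_coverage` of the
    document, together with the number of combinations examined.
    """
    ids = sorted(sessions)
    hits, examined = [], 0
    for size in range(2, len(ids) + 1):
        for combo in combinations(ids, size):
            examined += 1
            covered = set().union(*(sessions[i] for i in combo))
            if len(covered) / num_lines >= min_coverage:
                hits.append(combo)
    return hits, examined
```

Because the search ignores who performed each sub-session and when, it treats time division, space division, and mixed collusion uniformly, at the price of the exponential number of candidates.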


To  address  the  scalability  challenge,  we  introduced  several  optimization  techniques, 
e.g.,  eliminating  the  analysis  of  “unnecessary”  combinations  that  have  been  covered  in 
other  combinations.  These  optimization  techniques,  plus  simple  heuristics  such  as 
flagging  of  very  short  sub-sessions  (e.g.,  lasting  only  a  few  seconds)  as  immediately 
“non-human”  access,  make  our  analysis  tools  very  close  to  covering  all  combinations. 
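
A hedged sketch of how such pruning might look is given below. It is one possible reading of the text, not the project's algorithm. Very short sub-sessions are flagged at once and removed from the search, and a combination is skipped when it contains a smaller combination that has already been reported. The coverage check again stands in for the statistical analysis, and the 5-second cutoff, function name, and data layout are hypothetical.

```python
from itertools import combinations

def pruned_collusion_search(sessions, durations, num_lines, min_seconds=5.0):
    """Coverage search with two pruning heuristics (both are assumptions).

    1. Sub-sessions shorter than `min_seconds` are flagged as non-human at once
       and left out of the combinatorial search.
    2. A combination is skipped when it contains an already reported
       combination, since the smaller one already covers the document.
    """
    flagged = sorted(i for i in sessions if durations[i] < min_seconds)
    ids = sorted(i for i in sessions if durations[i] >= min_seconds)
    minimal_hits, examined = [], 0
    for size in range(2, len(ids) + 1):
        for combo in combinations(ids, size):
            if any(set(hit) <= set(combo) for hit in minimal_hits):
                continue  # covered by a smaller combination found earlier
            examined += 1
            covered = set().union(*(sessions[i] for i in combo))
            if len(covered) >= num_lines:
                minimal_hits.append(combo)
    return flagged, minimal_hits, examined
```

Each flagged sub-session halves the remaining search space, and skipping supersets of known hits removes candidates that would add no new information.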

For Generation 1 cyber insider attacks, the PPS system allows access only through
authorized APIs, which consist of a single API: the Firefox plug-in. All other accesses
are considered unauthorized and rejected. Thus our system is able to prevent
Generation 1 cyber insider attacks, independent of any threshold.

For  Generation  2  cyber  insider  attacks  that  use  the  Firefox  plug-in  API,  we  monitor  the 
document  access  patterns  at  fine  granularity  (cursor  and  mouse  pointer  positions  at 
intervals  of  a  fraction  of  a  second).  The  access  patterns  are  analyzed  for  collusion.  Our 
evaluation  shows  that  our  monitoring  and  analytical  tools  are  completely  effective  in 
detecting  Generation  2  cyber  insider  attacks  generated  by  MTP.  Details  of  evaluation 
are  discussed  in  Section  4.6  below. 

4.5  Concrete  Observables 

The  project  investigated  primarily  the  three  concrete  observables  listed  in  the  table 
below,  derived  from  the  original  observables  and  metrics.  The  first  concrete  observable 
is  the  document  access  through  the  PPS  API.  Assuming  that  PPS  API  has  not  been 
compromised,  its  use  outside  of  our  browsers  immediately  indicates  cyber  insider 
activity.  The  second  concrete  observable  consists  of  document  access  patterns 
recorded  by  our  browser  plug-ins,  e.g.,  the  Firefox  Javascript  plug-in.  This  observable 
includes  primarily  read  operations  with  associated  metadata  (time,  location).  The 
reading  pattern  indicates  the  reader’s  ability  to  find  high  value  information  in  the 
document.  The  third  concrete  observable  includes  the  amount  of  data  copied  by  the 
reader, which indicates the actual reader interest.

Table 2 List of Concrete Observables

| Observable | Applicable Dimension(s) | Delivered Sensors | Other Possible Sensors |
|---|---|---|---|
| Document Access API | PPS document access API detection | PPS tools and access monitors | Other storage systems with access monitors |
| Document Access Patterns (Accumulated amount of time spent on each part of the document) | Evasive cyber insider detection (time division, space division) and collusion analysis | PPS document access pattern recorder for various file types and their readers | Click level recorders, which would require significant post-processing |
| Copied Content / information | Clipboard operations that copy document content for potential exfiltration | OpenDLP and PPS content copy recorder for editors of PPS documents | Other DLP tools, if applicable |

4.6  Concrete  Metrics  for  Evaluation 


Table 3 Concrete Metrics for Evaluation

| Metric Area | Metric | How Measured | Success Criteria | Current State of the Art |
|---|---|---|---|---|
| Data Access Detection (effective) | D1. Machine PPS interface access detection | PPS data structure accessed through the native (machine) interface instead of a GUI. | Immediate detection of unauthorized document access by MTP. | N/A currently |
| Data Access Detection (effective) | D2. Access volume (bulk) detection | A large number of documents are accessed within a short period of time. | MTP accesses 20 documents within a 1-hour window. (Tunable threshold) | Available in DLP tools |
| Data Access Detection (effective) | D3. Access speed detection | Rapid scan of documents (too fast for human access). | MTP accesses 20 documents within a 5-minute window. (Tunable threshold) | N/A currently |
| Data Access Detection (effective) | D4. High value access detection | Scan of documents that find high value parts. | MTP accesses K valuable parts of a document. (Tunable threshold) | N/A currently |
| Data Access Detection (efficiency/scalability) | D5. Access detection overhead | Measurement of response time for PPS data access operations | MTP access to PPS documents with response time under 100ms | Similar to file system, but N/A for PPS |
| Collusion Detection (effective) | C1. Time division combinatorial detection | Combinatorial analysis of multiple sessions by single insider accessing parts of a document sequentially. | Detection of multiple MTP sessions that cover the entire document (up to 20 sessions) | N/A currently |
| Collusion Detection (effective) | C2. Space division collusion detection | Combinatorial analysis of multiple sessions by multiple insiders accessing parts of a document possibly in parallel. | Detection of multiple MTP sessions that cover the entire document (up to 20 sessions) | N/A currently |
| Collusion Detection (effective) | C3. Value-driven collusion detection | Combinatorial analysis of multiple sessions by multiple insiders accessing parts of a document in sequence and/or in parallel. | Detection of multiple MTP sessions that cover the most valuable parts of a document (up to 20 sessions) | |
| Collusion Detection (effective) | C4. Time/Space combination detection | Combinatorial analysis of multiple sessions by multiple insiders accessing parts of a document in sequence and/or in parallel. | Detection of multiple MTP sessions that cover the entire document (up to 20 sessions) | N/A currently |
| Collusion Detection (effective) | C5. Time, Space, Value combination collusion detection (T/V, S/V, T/S/V) | Combinatorial analysis of multiple sessions by multiple insiders accessing parts of a document in sequence and/or in parallel. | Detection of multiple MTP sessions that cover the most valuable parts of a document (up to 20 sessions) | N/A currently |
| Collusion Detection (efficiency/scalability) | C6. Scalability of naive combinatorial analysis | Run the analysis for all combinations of a set of sessions. | Find maximum number of sessions (K) that can be combined in exhaustive analysis | N/A currently |
| Collusion Detection (efficiency/scalability) | C7. Heuristics to improve scalability of collusion analysis | Use heuristics to improve detection efficiency (e.g., most likely collusions first) | Increase the number of sessions that can be combined in heuristic analysis to 5K | N/A currently |
| Collusion Detection (efficiency/scalability) | C8. A variety of statistical correlation metrics and aggregation methods for collusion analysis | A number of different metrics that can be used in evaluating the likelihood of collusion to increase robustness against adversarial attack tuning and specification | At least five separate statistical correlation algorithms are integrated in collusion analysis and combination analysis | |
| PPS Storage System (effective) | P1. Storage and retrieval of various document formats | PPS support for HTML, PDF, Word, and text formats | Demo of document access for the target formats; realistic evaluation tests using MTP | N/A currently |
| PPS Storage System (efficiency) | P2. PPS data access indistinguishable by human reading | PPS access (read / write) unnoticeable at human reading speed | A PPS document accessed and read via the GUI should take no longer than twice load time from normal systems | N/A currently |
| PPS Storage System (qualitative scalability) | P3. Extensible and flexible PPS metadata support | PPS metadata storage and query supports the analyses in Generation 2 and beyond | PPS metadata extensions will support all the analyses outlined in this metrics table | N/A currently |

Approved for Public Release; Distribution Unlimited.
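
The success criteria for metrics D2 and D3 (20 documents within a 1-hour or a 5-minute window, both tunable) amount to a sliding-window threshold test. A minimal sketch follows, under stated assumptions. The class and method names are hypothetical, timestamps are plain seconds that arrive in non-decreasing order per user, and the detector counts distinct documents per user.

```python
from collections import deque

class AccessRateDetector:
    """Flags a user who accesses `max_docs` distinct documents within `window_seconds`."""

    def __init__(self, max_docs=20, window_seconds=3600.0):
        self.max_docs = max_docs              # tunable threshold
        self.window_seconds = window_seconds  # tunable window length
        self.events = {}                      # user -> deque of (timestamp, doc_id)

    def record(self, user, doc_id, timestamp):
        """Record one access; return True if the threshold is reached."""
        q = self.events.setdefault(user, deque())
        q.append((timestamp, doc_id))
        while q and timestamp - q[0][0] > self.window_seconds:
            q.popleft()  # drop accesses that fell out of the window
        return len({d for _, d in q}) >= self.max_docs

# D2 (bulk): 20 documents within one hour; D3 (speed): 20 documents within five minutes.
bulk_detector = AccessRateDetector(max_docs=20, window_seconds=3600.0)
speed_detector = AccessRateDetector(max_docs=20, window_seconds=300.0)
```

A reader opening one document per minute would trip the bulk detector at the twentieth document but never the speed detector, which only ever sees about six documents inside its five-minute window.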

We  acknowledge  that  these  concrete  metrics  are  a  subset  of  and  derived  from  the 
original  observables  and  metrics  developed  in  collaboration  with  the  MIT  Lincoln  Labs 
independent  verification  and  validation  (IV&V)  team.  Furthermore,  we  acknowledge  that 
the  actual  metrics  used  in  the  evaluation  of  delivered  software  tools  (the  Fine  Grain 
Document  Modeling  system)  and  MTP  are  a  subset  of,  and  derived  from  the  concrete 
observables  and  metrics  listed  above.  This  is  due  to  the  instantiation  of  evaluation 
metrics  to  the  specific  and  actual  implementation  of  software  tools  and  MTP,  which 
support  a  substantial  subset  of  these  metrics  and  in  some  cases  more  detailed  metrics 
derived  from  these  listed.  Using  the  derived  actual  metrics,  we  have  demonstrated  the 
effectiveness  and  scalability  of  our  software  tools,  with  positive  conclusions  on  the 
derived metrics.

The IV&V team conducted independent tests on the delivered software tools using the
derived  actual  metrics  that  form  a  subset  of  concrete  metrics  listed,  as  described  in  their 
report.  Examples  of  metrics  that  were  not  directly  supported  by  MTP  and  so  could  not 
be  tested  include  C6,  C7,  and  C8  (and  their  scalability).  Within  the  context  of  the  actual 
subset  of  metrics  tested  and  verified,  we  quote  the  first  paragraph  of  their  Result 
Summary  (page  2): 

Overall,  the  tests  showed  the  MTP  successfully  produced  the  anomalous  activity  and 
the  reading  patterns  were  captured  by  the  Fine  Grain  Document  Modeling  system.  For 
the  remainder  of  the  paper,  both  the  Fine  Grain  Document  Modeling  system  and  the 
MTP  will  be  called  the  System  Under  Test  (SUT).  The  SUT  was  able  to  simulate  and 
detect  single  and  multiple  agents  accessing  single  and  multiple  documents  in  short 
bursts  or  over  extended  periods  of  time.  The  system  also  detected  access  to  high  value 
portions  of  documents  by  single  and  multiple  agents  during  short  or  prolonged  access, 
as  well  as  access  to  different  portions  of  the  same  document  over  time. 


5  Conclusion 

This  project  built  software  tools  to  store  documents  in  the  PPS  system,  monitor 
accesses  to  PPS  documents,  and  analyze  the  access  patterns  to  find  cyber  insider 
attacks.  We  use  a  simple  API  access  authorization  scheme  to  thwart  Generation  1 
cyber  insider  attacks  (direct  data  transfers).  A  combination  of  fine-grained  access 
monitoring  data  and  statistical  analysis  on  combined  sub-sessions  supports  the 
effective  detection  of  Generation  2  cyber  insider  attacks  (minimized/targeted  data 
transfers through legitimate APIs).

6 References

6.1 Software tools and documents

VMware Workstation server image (all software):
Available by request to authors and upon approval by sponsor.

Malware Test Platform (MTP):
Available by request to authors and upon approval by sponsor.

Malware Test Platform Documentation:
Available by request to authors and upon approval by sponsor.

Test Bed Manual:
Available by request to authors and upon approval by sponsor.

Appendix A: MTP Manual


Getting  started 

In  order  to  begin  using  the  software  that  we  have  developed,  boot  the  VMware  image  in 
VMware  Workstation.  Upon  booting,  the  accounts  that  can  be  used  to  log  in  are  as 
follows:

•  Chris  Grayson  //  Delicious_ham_sandwich 

•  root  //  Delicious_ham_sandwich 

Upon  logging  in,  both  the  MySQL  and  Apache  HTTP  daemons  will  need  to  be  started. 

In  order  to  do  so,  open  up  terminal  and  do  the  following. 

•  su root; cd /etc/init.d; ./mysqld start; ./httpd start; (if logged in as Chris Grayson)

•  cd /etc/init.d; ./mysqld start; ./httpd start; (if logged in as root)

Once  all  of  this  is  done  our  software  should  be  ready  to  go. 
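
To confirm that both daemons are accepting connections before moving on, a short check such as the following Python sketch may help. It assumes the default ports (80 for httpd, 3306 for mysqld), which this manual does not state; if MySQL is configured to listen only on a local socket file, the port check will report it as closed even when the daemon is running.

```python
import socket

def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_services(host="localhost", services=None):
    """Map each service name to whether its port accepts connections."""
    if services is None:
        # Assumed default ports; adjust if the VM is configured differently.
        services = {"httpd": 80, "mysqld": 3306}
    return {name: is_port_open(host, port) for name, port in services.items()}

if __name__ == "__main__":
    for name, is_up in check_services().items():
        print(f"{name}: {'listening' if is_up else 'not reachable'}")
```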

Creating  an  account  for  our  PPS  system 

Next,  in  order  to  use  the  malware  test  platform  or  any  of  the  other  tools  we  have 
developed,  you  will  need  to  create  an  account  through  the  web-based  GUI.  To  do  so, 
simply  do  the  following: 

**NOTE**  When  using  the  web  interface  for  any  activity,  it  is  best  to  zoom  it  out  to  the 
proper  zoom  level.  This  is  typically  done  by  holding  CTRL  and  scrolling  the  mouse 
wheel.  The  proper  zoom  level  is  shown  below. 


Figure 1 Screenshot of Login Page (page title "Welcome to the Infosphere Research Project!" with "Log in" and "Register" boxes)

1.  Open a web browser

2.  Navigate  to  http://localhost/info2 

3.  Enter  in  all  necessary  credentials  in  the  "Register"  box.  The  registration  code  to 
use  is  "Delicious_ham_sandwich" 

4.  Hit  the  "Submit"  button 

5.  On  the  subsequent  page,  scroll  to  the  bottom  and  click  on  the  "I  Agree"  button 

6.  Upon  hitting  the  "I  Agree"  button  your  account  creation  will  be  confirmed  and  you 
will  be  redirected  to  the  log  in  page. 


Approved  for  Public  Release;  Distribution  Unlimited. 



You now have an account with our PPS system. To access any other tools
through the web GUI, navigate to http://localhost/info2/login.php, enter your
credentials, and log in.

Uploading  a  document 

To  upload  a  document,  do  the  following: 

1.  Log  in  to  the  web  GUI 

2.  Click  on  the  "Upload  document"  button  in  the  "Upload  a  new  document"  box. 

3.  On the subsequent page, click on the "Browse..." button to find the document you
would like to upload on your local machine.

4.  Once  you  have  found  and  selected  the  appropriate  document,  you  can  enter  a 
description  of  the  document  in  the  allotted  space. 

5.  Once  you  are  satisfied  with  your  description  (not  having  a  description  is  fine)  click 
on  the  "Upload"  button 

6.  After  an  amount  of  time  proportional  to  the  size  of  the  document  you  should 
receive  a  message  stating  that  your  document  was  successfully  uploaded. 

Viewing  a  document 

To  view  a  document,  do  the  following: 

1.  Log  in  to  the  web  GUI 

2.  In  the  "Read  a  document"  box,  select  the  document  you  would  like  to  read  from 
the  drop  down 

3.  Once  you  have  the  document  selected,  click  on  the  "Start  reading"  button 

4.  Allow  the  document  to  fully  load  before  attempting  to  scroll  /  read 

5.  Once you are done reading, click on the "Done reading" link in the bottom
right-hand corner of the page

Annotating  a  document 

To  annotate  a  document,  do  the  following: 

1.  Log  in  to  the  web  GUI 

2.  In the "Annotate an existing document" box, select the document you would like
to annotate from the drop down

3.  Once  you  have  the  document  selected,  click  on  the  "Start  annotating"  button 

4.  Allow  the  document  to  fully  load  before  attempting  to  scroll  /  read  /  annotate 

5.  To  annotate  a  given  region,  hold  down  the  CTRL  key  and  click  and  drag  the 
resulting  box  over  the  area  of  the  document  you  would  like  to  annotate. 

6.  Once you release the mouse button, the box will become dark blue. At this point
you can assign a value to the annotated area using the arrow keys in the top
right-hand corner of the blue box. Should you want to delete the box, click on the
'X' in the top right-hand corner of the blue box.

7.  Once you have annotated all sections of the document that you would like, click
on the "Save annotation" link in the bottom right-hand corner of the page
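
To make the role of annotation values concrete, the following Python sketch models an annotated region as a rectangle with an assigned value and sums the values of the regions that intersect a viewed area of the page. The class name, the pixel-coordinate fields, and the summation rule are hypothetical choices for this illustration; the manual does not describe how the PPS stores or scores annotations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    """A rectangular annotated region (hypothetical layout, page coordinates)."""
    left: int
    top: int
    right: int
    bottom: int
    value: int  # value assigned with the arrow keys on the annotation box

def overlaps(annotation, viewport):
    """True if the annotation intersects the (left, top, right, bottom) viewport."""
    v_left, v_top, v_right, v_bottom = viewport
    return (annotation.left < v_right and v_left < annotation.right
            and annotation.top < v_bottom and v_top < annotation.bottom)

def viewed_value(annotations, viewport):
    """Sum the values of all annotated regions visible in the viewport."""
    return sum(a.value for a in annotations if overlaps(a, viewport))
```

Under these assumptions, an access that touches only a small part of a page can still accumulate a high value if that part overlaps highly valued annotated regions.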


Viewing  your  reading  patterns 

To  view  the  reading  patterns  that  you  have  created  using  the  PPS  GUI  system,  do  the 
following: 

1.  Log  in  to  the  web  GUI 

2.  In  the  "View  document  reading  patterns"  box,  select  "View  own  reading  patterns" 
from  the  drop  down 

3.  Click  on  the  "View  access  patterns"  button 

4.  Once  on  the  reviewing  tool  page,  you  can  select  how  you  would  like  to  view  your 
own  reading  patterns  from  the  drop  down  ("View  individual  patterns"  or  "View 
patterns  joined  by  document") 

Logging  in  to  the  Malware  Test  Platform 

To  log  in  to  the  malware  test  platform,  do  the  following: 

1.  Create an account with the PPS system using the web GUI

2.  Open  the  malware  test  platform  by  double-clicking  the  icon  on  the  desktop 

3.  Log  in  to  the  malware  test  platform  using  the  credentials  you  provided  upon 
registering  via  the  web  GUI 

Deploying  malware  to  the  PPS  data  structure 

To  deploy  malware  to  the  PPS  data  structure,  do  the  following: 

1.  Log in to the malware test platform

2.  At  the  main  menu  for  the  MTP,  click  on  "Deploy  malware" 

3.  On  the  deploy  malware  frame,  choose  a  malware  type  that  you  would  like  to 
deploy  from  the  drop  down  list.  It  should  be  noted  that  some  of  the  types  of 
malware  depend  on  there  being  multiple  users  in  the  DB  and/or  annotations  for 
documents. 

4.  Fill  out  the  GUI  components  for  the  type  of  malware  you  would  like  to  deploy 

5.  Click  on  the  "Deploy"  button  a  single  time 

6.  Watch  the  log  to  make  sure  everything  was  deployed  successfully 

Removing  malware  from  the  PPS  data  structure 

To  remove  malware  from  the  PPS  data  structure,  do  the  following: 

1.  Log in to the malware test platform

2.  At  the  main  menu  for  the  MTP,  click  on  "View  deployed  malware" 

3.  On  the  monitor  deployed  malware  frame,  select  the  row  in  the  deployed  malware 
table  that  corresponds  to  the  malware  you  would  like  to  remove 

4.  Click  on  the  "Delete  malware"  button 

5.  Watch the log to make sure everything was removed successfully

Analyze  PPS  structure  for  malware  activity 

To  analyze  the  PPS  structure  for  malware  activity,  do  the  following: 

1.  Open a browser and navigate to http://localhost/info2/libraries/analysisPage.php

2.  Choose  an  analysis  type  corresponding  to  the  mission  and  metrics 
documentation  from  the  drop  down 

3.  Choose  a  sub-type  from  the  second  drop  down 

4.  Fill out the necessary criteria for the analysis you plan to run (some
analyses require no additional input). Please note that input is not validated,
so the user must supply well-formed input (this is straightforward, as
IDs are integers and valid ranges are specified).

5.  Click  on  the  "Scan"  button 

Please note that some of the analyses will take longer than others, and some will run
into scalability issues. For testing purposes, please evaluate functionality and hold off on
scalability testing until we provide an updated VM.
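
Because the analysis page does not validate its input, values can be checked before they are entered. The following Python sketch shows one way to do so for an integer ID with a stated valid range. The function name and the example range are hypothetical and are not part of the PPS tools.

```python
import re

def parse_id(text, low, high):
    """Parse text as a non-negative integer ID and check low <= value <= high.

    Raises ValueError with a descriptive message if the input is malformed
    or outside the range.
    """
    stripped = text.strip()
    # Accept plain ASCII digits only (no sign, underscores, or decimal point).
    if not re.fullmatch(r"[0-9]+", stripped):
        raise ValueError(f"ID must be an integer, got {text!r}")
    value = int(stripped)
    if not low <= value <= high:
        raise ValueError(f"ID {value} is outside the valid range {low}-{high}")
    return value
```

For example, `parse_id("42", 1, 100)` returns the integer 42, while `parse_id("abc", 1, 100)` and `parse_id("101", 1, 100)` raise `ValueError`.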


Database  structure 

To  access  the  database,  use  the  user  name  "root"  with  the  password  "new-password". 
The  database  schema  is  as  follows: 

List of Acronyms

•  API: Application Programming Interface, the access interface definition of a software
module that other programs use to call the module.

•  DLP:  Data  Loss  Prevention,  techniques  and  software  tools  to  detect  unauthorized 
data  sharing,  typically  through  messages  sent  over  networks. 

•  GUI:  Graphical  User  Interface,  a  program  used  by  human  users  to  access  software 
tool  functionality. 

•  HTML: Hypertext Markup Language, a widely used web document format.

•  HTTP: Hypertext Transfer Protocol, a network protocol for transmitting messages over the World Wide Web.

•  IPR:  Interim  Program  Review,  project  review  during  the  project  execution. 

•  IRB:  Institutional  Review  Board,  a  board  that  monitors  and  approves  human-related 
research. 

•  MTP:  Malware  Test  Platform,  a  software  tool  that  generates  cyber  insider  attacks  for 
many  documents. 

•  PDF: Portable Document Format, a standard document representation format.

•  PPS:  Partial  Persistent  Sequence,  a  customized  data  structure  to  store  data  and 
metadata  of  documents;  the  current  implementation  uses  a  relational  database 
management  system  (MySQL). 

•  T/V, S/V, T/S/V: combinations of cyber insider collusion attack modes: T=time
division, S=space division, V=value based. T/V means time and value combination.
S/V means space and value combination. T/S/V means time, space, and value
combination.

•  TXT: plain text file type for storing documents in common operating systems such as
Windows.


Glossary  of  Technical  Terms 

•  Apache:  an  open  source  web  server,  used  in  the  project  to  manage  GUI  and  web 
accesses. 

•  Generation 1: cyber insider attacks that use direct/bulk data transfers.

•  Generation  2:  cyber  insider  attacks  that  stay  below  certain  thresholds  by  using 
collusion  methods. 

•  MySQL: an open source relational database management system, used in the project to
implement PPS.

•  OpenDLP:  open  source  software  tool  that  implements  many  DLP  functions. 

•  VMware:  a  generic  name  for  the  virtual  machine  environment  provided  by  a  family  of 
commercial  virtual  machine  monitors  produced  by  the  company  of  same  name. 